
    Dynamically Reconfigurable Systems-on-Chip

    The design space for dynamically reconfigurable SoCs spans three dimensions: 1) the system architecture for computation and communication, ranging from dataflow-oriented dedicated logic blocks to instruction-flow-oriented microprocessor cores, and from dedicated point-to-point connections to Networks-on-Chip; 2) the granularity of reconfigurable elements, ranging from simple logic look-up tables to complex hardware accelerator engines and reconfigurable interconnect structures; and 3) the configuration life cycle, ranging from application changes (on the order of seconds) to instruction-based reconfiguration (on the order of nanoseconds). We propose to use dynamically reconfigurable computing for video processing in driver assistance applications. In future automotive systems, video-based driver assistance will improve safety. Video processing for driver assistance requires real-time implementation of complex algorithms. A pure software implementation on the low-cost embedded CPUs found in automotive environments cannot deliver the required real-time processing, so hardware acceleration is necessary. Dedicated hardware circuits (ASICs) can deliver the required real-time processing, but they lack the necessary flexibility: specific driving conditions, e.g. highway, countryside, urban traffic, or tunnel, require specifically optimized algorithms. Reconfigurable hardware therefore offers high potential for real-time video processing combined with adaptability to varying driving conditions. Our system architecture consists of embedded CPU cores for high-level application code, dedicated hardware accelerator engines for low-level pixel processing, and an application-specific memory system. The hardware accelerators and the memory system are dynamically reconfigurable, i.e. accelerator engines can be exchanged at runtime under the control of the application code running on the CPU. The life cycle of a configuration depends on changes in driving conditions, and the reconfiguration time is bounded by the frame rate of the video signal, e.g. 40 ms for the exchange and relocation of new engines.
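
    A minimal Python sketch of the runtime control flow described above: the application code on the CPU reacts to a driving-condition change by exchanging accelerator engines and checking that the swap fits the 40 ms frame budget. The fabric API (`load_partial_bitstream`), the engine names, and the condition-to-engine mapping are hypothetical illustrations, not taken from the paper.

    ```python
    import time

    class FabricStub:
        """Stand-in for a partial-reconfiguration driver (hypothetical API)."""
        def load_partial_bitstream(self, engine):
            pass  # a real driver would write the engine's bitstream to the fabric

    # Illustrative mapping of driving conditions to accelerator engine sets.
    ENGINE_CONFIGS = {
        "highway": ["lane_detect", "long_range_fusion"],
        "urban": ["pedestrian_detect", "sign_recognition"],
        "tunnel": ["low_light_enhance", "lane_detect"],
    }

    FRAME_BUDGET_S = 0.040  # one frame period at 25 fps bounds the swap time

    class ReconfigManager:
        def __init__(self, fabric):
            self.fabric = fabric
            self.active = None

        def on_condition_change(self, condition):
            """Exchange accelerator engines when the driving condition changes."""
            if condition == self.active:
                return
            start = time.monotonic()
            for engine in ENGINE_CONFIGS[condition]:
                self.fabric.load_partial_bitstream(engine)
            elapsed = time.monotonic() - start
            assert elapsed <= FRAME_BUDGET_S, "swap exceeded the frame budget"
            self.active = condition

    manager = ReconfigManager(FabricStub())
    manager.on_condition_change("tunnel")  # e.g. triggered by a scene classifier
    ```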

    Adaptive tracking of people and vehicles using mobile platforms

    Tracking algorithms have important applications in the detection of humans and vehicles for border security and other areas. For large-scale deployment of such algorithms, it is critical to provide methods for their cost- and energy-efficient realization. To this end, commodity mobile devices have significant potential for use as prototyping and testing platforms due to their low cost, widespread availability, and integration of advanced communications, sensing, and processing features. Prototypes developed on mobile platforms can be tested, fine-tuned, and demonstrated in the field, and then provide reference implementations for application-specific disposable sensor node implementations that are targeted for deployment. In this paper, we develop a novel, adaptive tracking system that is optimized for energy-efficient, real-time operation on off-the-shelf mobile platforms. Our tracking system applies principles of dynamic data-driven application systems (DDDAS) to periodically monitor system operating characteristics and uses these measurements to dynamically adapt the specific classifier configurations that the system employs. The resulting adaptive approach enables powerful optimization of trade-offs among energy consumption, real-time performance, and tracking accuracy based on time-varying changes in operational characteristics. Through experiments employing an Android-based tablet platform, we demonstrate the efficiency of our proposed tracking system design for multimode detection of human and vehicle targets.
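
    A short Python sketch of the DDDAS-style adaptation loop the abstract describes: operating conditions are sampled periodically, and the classifier configuration is re-selected to trade accuracy against the current energy budget. The configurations, costs, and budget heuristic are illustrative assumptions, not the paper's actual values.

    ```python
    import random  # stands in for real battery/load/deadline telemetry

    # Hypothetical classifier configurations: (name, relative energy cost,
    # expected accuracy). The numbers are illustrative only.
    CONFIGS = [
        ("hog_svm_small", 1.0, 0.82),
        ("hog_svm_full", 2.5, 0.90),
        ("cnn_full", 6.0, 0.95),
    ]

    def measure_operating_state():
        """DDDAS measurement step: sample battery level and deadline slack."""
        return {"battery": random.uniform(0.0, 1.0),
                "deadline_slack": random.uniform(0.0, 1.0)}

    def select_config(state):
        """Pick the most accurate configuration the current budget allows."""
        budget = 1.0 + 5.0 * state["battery"] * state["deadline_slack"]
        feasible = [c for c in CONFIGS if c[1] <= budget]
        return max(feasible, key=lambda c: c[2])  # cheapest config is always feasible

    # Periodic adaptation: re-evaluate the configuration every 30 frames.
    for frame in range(0, 300, 30):
        name, cost, acc = select_config(measure_operating_state())
        # the detector would then process the next 30 frames with `name`
    ```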

    Mind the Scaling Factors: Resilience Analysis of Quantized Adversarially Robust CNNs

    As more deep learning algorithms enter safety-critical application domains, the importance of analyzing their resilience against hardware faults cannot be overstated. Most existing works focus on bit-flips in memory, fewer focus on compute errors, and almost none study the effect of hardware faults on adversarially trained convolutional neural networks (CNNs). In this work, we show that adversarially trained CNNs are more susceptible to failure due to hardware errors than vanilla-trained models. We identify large differences between the quantization scaling factors of the CNNs that are resilient to hardware faults and of those that are not. As adversarially trained CNNs learn robustness against input attack perturbations, their internal weight and activation distributions open a backdoor for injecting large-magnitude hardware faults. We propose a simple weight decay remedy for adversarially trained models to maintain adversarial robustness and hardware resilience in the same CNN. We improve the fault resilience of an adversarially trained ResNet56 by 25% on large-scale bit-flip benchmarks on activation data while slightly improving accuracy and adversarial robustness.
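
    A minimal PyTorch sketch of the proposed remedy: an adversarial training step with an explicit weight decay term that keeps weight magnitudes, and hence the quantization scaling factors, small. The FGSM adversary and all hyperparameters are generic placeholders, not the authors' exact setup.

    ```python
    import torch
    import torch.nn.functional as F

    def fgsm(model, x, y, eps=8 / 255):
        """Simple FGSM adversary as a stand-in for the training-time attack."""
        x = x.clone().detach().requires_grad_(True)
        F.cross_entropy(model(x), y).backward()
        return (x + eps * x.grad.sign()).clamp(0.0, 1.0).detach()

    def train_step(model, x, y, optimizer, wd=1e-4):
        """Adversarial training step with an explicit weight decay penalty."""
        x_adv = fgsm(model, x, y)
        loss = F.cross_entropy(model(x_adv), y)
        # Penalizing large weights keeps per-tensor quantization scaling factors
        # small, limiting the magnitude a single bit-flip can inject at runtime.
        loss = loss + wd * sum(p.pow(2).sum() for p in model.parameters())
        optimizer.zero_grad()
        loss.backward()
        optimizer.step()
        return loss.item()
    ```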

    Autonomous Robotic Screening of Tubular Structures based only on Real-Time Ultrasound Imaging Feedback

    Ultrasound (US) imaging is widely employed for the diagnosis and staging of peripheral vascular diseases (PVD), mainly due to its high availability and the fact that it does not emit radiation. However, high inter-operator variability and a lack of repeatability in US image acquisition hinder the implementation of extensive screening programs. To address this challenge, we propose an end-to-end workflow for automatic robotic US screening of tubular structures using only real-time US imaging feedback. We first train a U-Net for real-time segmentation of the vascular structure from cross-sectional US images. We then represent the detected vascular structure as a 3D point cloud and use it to estimate the longitudinal axis of the target tubular structure and its mean radius by solving a constrained non-linear optimization problem. By iterating these processes, the US probe is automatically aligned to the orientation normal to the target tubular tissue and adjusted online to keep the tracked tissue centered, based on the spatial calibration. The real-time segmentation is evaluated both on a phantom and in vivo on the brachial arteries of volunteers, and the whole process is validated both in simulation and on physical phantoms. The mean absolute radius error and orientation error (± SD) in simulation are 1.16 ± 0.1 mm and 2.7 ± 3.3°, respectively; on a gel phantom, these errors are 1.95 ± 2.02 mm and 3.3 ± 2.4°. This shows that the method is able to automatically screen tubular tissues with an optimal probe orientation (i.e. normal to the vessel) while accurately estimating the mean radius, both in real time. Accepted for publication in IEEE Transactions on Industrial Electronics. Video: https://www.youtube.com/watch?v=VAaNZL0I5i
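
    A small Python/NumPy sketch of the geometric core: fitting a cylinder (axis plus mean radius) to the segmented 3D point cloud by non-linear least squares. This is a generic stand-in for the paper's constrained optimization, not the authors' exact formulation.

    ```python
    import numpy as np
    from scipy.optimize import minimize

    def fit_cylinder(points):
        """Fit a cylinder (axis point p0, unit direction d, radius r) to an
        (N, 3) point cloud by least squares on point-to-axis distances."""
        def cost(params):
            p0, d, r = params[:3], params[3:6], params[6]
            d = d / np.linalg.norm(d)  # keep the axis direction unit-length
            dist = np.linalg.norm(np.cross(points - p0, d), axis=1)
            return np.sum((dist - r) ** 2)
        # Initialize at the centroid, a vertical axis, and a 2 mm radius guess.
        x0 = np.concatenate([points.mean(axis=0), [0.0, 0.0, 1.0], [2.0]])
        res = minimize(cost, x0, method="Nelder-Mead")
        d = res.x[3:6] / np.linalg.norm(res.x[3:6])
        return res.x[:3], d, abs(res.x[6])

    # The estimated direction d gives the vessel's longitudinal axis, to which
    # the probe is aligned; r corresponds to the reported mean radius.
    ```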

    HW-Flow-Fusion: Inter-Layer Scheduling for Convolutional Neural Network Accelerators with Dataflow Architectures

    Energy- and throughput-efficient acceleration of convolutional neural networks (CNNs) on devices with a strict power budget is achieved by leveraging different scheduling techniques to minimize data movement and maximize data reuse. Several dataflow mapping frameworks have been developed to explore the optimal scheduling of CNN layers on reconfigurable accelerators. However, previous works usually optimize each layer individually, without leveraging the data reuse between the layers of a CNN. In this work, we present an analytical model that achieves efficient data reuse by searching for efficient scheduling of communication and computation across layers. We call this inter-layer scheduling framework HW-Flow-Fusion, as we explore the fused map-space of multiple layers sharing the available resources of the same accelerator, investigating the constraints and trade-offs of mapping the execution of multiple workloads with data dependencies. We propose a memory-efficient data reuse model, tiling, and resource partitioning strategies to fuse multiple layers without recomputation. Compared to standard single-layer scheduling, inter-layer scheduling reduces the communication volume by 51% and 53% for selected VGG16-E and ResNet18 layers on a spatial array accelerator, and reduces latency by 39% and 34% respectively, while also increasing the computation-to-communication ratio, which improves memory bandwidth efficiency.
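
    A tiny Python sketch of the tiling arithmetic behind recomputation-free fusion: the output tile of the fused stack is propagated backwards through each layer's kernel and stride to size the intermediate tile that must stay on chip. Layer shapes are illustrative; the paper's analytical model additionally covers buffer partitioning and general dataflows.

    ```python
    def input_tile(out_tile, kernel, stride):
        """Input rows/cols a conv layer needs to produce `out_tile` rows/cols."""
        return (out_tile - 1) * stride + kernel

    # Fusing conv2 (3x3, stride 1) on top of conv1 (3x3, stride 1):
    tile_out = 8                                          # fused-stack output tile
    tile_mid = input_tile(tile_out, kernel=3, stride=1)   # 10: conv1 output tile
    tile_in = input_tile(tile_mid, kernel=3, stride=1)    # 12: tile read from DRAM

    # Keeping the 10x10 intermediate tile in on-chip buffers removes conv1's
    # output write and conv2's input read from off-chip traffic; eliminating
    # this intermediate traffic is what drives the reported communication savings.
    ```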

    HW-Flow: A Multi-Abstraction Level HW-CNN Codesign Pruning Methodology

    Convolutional neural networks (CNNs) have achieved unprecedented accuracy on many computer vision problems in recent years. Deploying modern CNNs on power- and compute-constrained embedded platforms presents many challenges: most CNN architectures do not run in real time due to the high number of computational operations involved in the inference phase. This emphasizes the role of CNN optimization techniques in early design space exploration. To estimate their efficacy in satisfying the target constraints, existing techniques are either hardware (HW) agnostic, pseudo-HW-aware through parameter and operation counts, or HW-aware through inflexible hardware-in-the-loop (HIL) setups. In this work, we introduce HW-Flow, a framework for optimizing and exploring CNN models based on three levels of hardware abstraction: Coarse, Mid, and Fine. Through these levels, CNN design and optimization can be iteratively refined towards efficient execution on the target hardware platform. We present HW-Flow in the context of CNN pruning by augmenting a reinforcement learning agent with key metrics to understand the influence of its pruning actions on the inference hardware. With a 2× reduction in energy and latency, we prune ResNet56, ResNet50, and DeepLabv3 with minimal accuracy degradation on the CIFAR-10, ImageNet, and Cityscapes datasets, respectively.
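
    A minimal Python sketch of a hardware-aware reward of the kind such a pruning agent could use, combining task accuracy with latency and energy estimates like those the Coarse/Mid/Fine abstraction levels provide. The functional form, weights, and budgets are illustrative assumptions, not the paper's definition.

    ```python
    def reward(acc, latency_ms, energy_mj, lat_budget=10.0, en_budget=5.0,
               alpha=1.0, beta=0.5, gamma=0.5):
        """Hardware-aware reward: accuracy minus penalties for budget overshoot."""
        lat_over = max(0.0, latency_ms / lat_budget - 1.0)  # penalize only overshoot
        en_over = max(0.0, energy_mj / en_budget - 1.0)
        return alpha * acc - beta * lat_over - gamma * en_over

    # A pruning action that meets both budgets is rewarded purely by accuracy;
    # one that overshoots the latency budget by 20% loses 0.5 * 0.2 = 0.1 reward.
    print(reward(0.92, latency_ms=9.0, energy_mj=4.0))   # 0.92
    print(reward(0.93, latency_ms=12.0, energy_mj=4.0))  # ~0.83
    ```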